Automated Theory Formation in Bioinformatics

Author

  • Simon Colton
Abstract

A theory learned by an inductive logic programming (ILP) system such as Progol [5] usually comprises a set of concepts, expressed as logic programs, which can be employed for a classification task. This classifying ability can, in turn, be used for prediction tasks. A scientific theory, however, comprises much more information: concepts; hypotheses relating concepts; explanations and empirical justifications of the hypotheses; representation schemes; experimental methodologies; and so on. Working mainly in mathematics, we have used the HR system [1] to form theories about the objects of interest in a domain. For example, in group theory, where the objects are groups, HR invents concepts, makes conjectures about those concepts, and proves (some of) the conjectures using the Otter theorem prover [4]. Despite its history in mathematics, we have developed HR as a domain-independent machine learning program. In particular, its format for background information is very similar to that of Progol. Given this, we are currently exploring various possibilities for automated theory formation (ATF) using bioinformatics datasets. We describe here an application of HR to the mutagenesis data set [6] and suggest some advantages of ATF over ILP, some disadvantages, and some possibilities for the fruitful combination of the two techniques.

Similar resources

Nucleosome formation potential of eukaryotic DNA: calculation and promoters analysis

MOTIVATION: Rapid growth in the number of genes with known sequences calls for automated tools for their classification and analysis. It has become clear that nucleosome packaging of eukaryotic DNA is very important for gene function. Automated computer tools for characterizing nucleosome packaging density could be useful for studying gene regulation and for genome annotation. ...

Patikaweb: a Web interface for analyzing biological pathways through advanced querying and visualization

Patikaweb provides a Web interface for retrieving and analyzing biological pathways in the Patika database, which contains data integrated from various prominent public pathway databases. It features a user-friendly interface, dynamic visualization and automated layout, advanced graph-theoretic queries for extracting biologically important phenomena, local persistence capability and exporting f...

Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM

Finding repetitive subsequences in genomes is a challenging problem in bioinformatics research. Many approaches have been proposed to solve the problem, and they can be divided into library-based and de novo methods. Library-based methods use predetermined repetitive genomic subsequences, whereas library-less methods attempt to discover repetitive subsequences by analytical approach...

The Importance of α-CT and Salt bridges in the Formation of Insulin and its Receptor Complex by Computational Simulation

The hormone insulin is an important part of the endocrine system. It contains two polypeptide chains and plays a pivotal role in regulating carbohydrate metabolism. Insulin receptors (IR) located on the cell surface interact with insulin to control the intake of glucose. Although several studies have tried to clarify the interaction between insulin and its receptor, the mechanism of this interaction r...

A two stage model for Cell Formation Problem (CFP) considering the inter-cellular movements by AGVs

This paper addresses the Cell Formation Problem (CFP) in which Automated Guided Vehicles (AGVs) are employed to transfer jobs that may need to visit one or more cells. Because of added constraints such as AGV conflicts and excessive stoppage in one place, AGVs may select different paths from one cell to another over time. This means that the ti...

Publication date: 2006